I.  INTRODUCTION


The U.S. Navy's impact acceleration research program being conducted by the Naval Aerospace Medical Research Laboratory (NAMRL) Detachment is accumulating an extensive data base on dynamic response, for both humans and subhuman primates, under stringently controlled experimental conditions. This wealth of empirical data offers the possibility of developing a statistical model of head/neck impact acceleration injury based primarily on information obtained from the data base. The framework of such a model has been discussed in a previous technical report [2]. The following sections of the current report address the topic of estimation accuracy when this type of empirically based model is used.

The NAMRL research is focused on the head/neck system, which is the most vulnerable body segment in terms of impact acceleration injury. Because the NAMRL data base consists primarily of head and neck dynamic response data, the discussion in this report will be restricted to consideration of injury in that body segment. However, the general procedures proposed for model construction should be adaptable to injury prediction models for other body segments.


II.  BACKGROUND 


Most head injury models, of necessity, are mathematical models based on a number of underlying assumptions about head/neck movement, forces, and overall injury mechanisms. The existence of a well-controlled data base from animal and human impact acceleration experiments permits consideration of a probabilistic injury model derived from empirical data embedded in a statistical framework. This type of model, unlike a standard mathematical model, is based primarily on information contained in observed data, rather than on information derived from theoretical assumptions about the mechanical structure and dynamics of the head/neck segment. Thus, the modeling approach discussed in this report should offer new insights into the injury prediction problem by complementing the approaches usually used.

A.  PROBLEM  DISCUSSION 

Consider the impact acceleration situation in which the torso is well restrained, but the head and neck are unrestrained. In this situation the problem is one of predicting whether a human of given anthropometric characteristics will sustain injury if exposed to an impact acceleration that results in a given dynamic response of the head/neck system. A number of difficulties must be overcome to develop an injury prediction model from empirical data. If enough instrumented human subjects were available, and could be subjected to various acceleration time traces, a reasonable prediction model would eventually result. Of course, this procedure is not possible: human subjects cannot be purposely injured.

Because experiments involving humans cannot be planned for potentially injurious regions of the data space, any empirical data gathered in those regions must come from human analogs (for example, subhuman primates). The results must then be scaled or extrapolated to humans.

This scaling and extrapolation problem will not be discussed in this report.

In any event, it must be realized that the situation is not deterministic. For example, even with a restrained torso, the same impact acceleration will result in different head/neck responses (for example, because of differences in initial head position), and even apparently identical head responses for the same person may result in injury sometimes and not at other times. This binary (injury/no injury) aspect of injury occurrence defines a discrete random variable that must be considered. To further complicate matters, the acceleration and dynamic response data under consideration are time trace data.

Despite the problems mentioned here, it should be noted that, in general, constructing a model is relatively easy. It is the validation of that model that is difficult. Validation may be defined as establishing acceptable agreement between model predictions and observed data.


B.  MODEL  ASSUMPTIONS  AND  FRAMEWORK 

To construct any injury prediction model, some assumptions must be made. The trick is to make assumptions that are at least approximately correct; it is hoped that this is true of the assumptions made in this section.

Because dealing with the complete acceleration time traces of the head is an intractable analytic task, a set of univariate head dynamic response variables that may be expected to be related to injury will be considered. Likely candidates include, for example, linear and angular velocities and accelerations (average or peak). Although a number of anthropometric variables can also be postulated, it is probably reasonable to assume that their effect within a species will be minor when compared to that of the dynamic response variables. Thus, it is suggested that only the dynamic response variables be considered in initial model development. At a later stage, the anthropometric variables should prove important in scaling. (See the discussion in [2].)

Exactly which variables to include in a model will not be discussed here. If the total set of potential variables is large, some preliminary screening will, of course, have to be done. Should important variables be excluded from consideration, this should become apparent as model development proceeds. If all experimental data are saved, the primary penalty imposed by such an occurrence would be the requirement for additional analysis time.

In general, then, a set of $k$ variables, denoted by $\mathbf{x} = (x_1, \ldots, x_k)$, is being considered. It is postulated that the probability of injury is some (unknown) function of these variables. Furthermore, although the function is unknown, it will be near zero in part of $\mathbf{x}$-space, near one in another part, and will increase from near zero to near one over an intermediate part. Experimentally, what is observed in a given situation is only an estimated value (either 0 or 1) of the true probability.

In summary, the probability of injury is being considered as a function of $\mathbf{x}$. Thus, this probability may be denoted by:

$$
P = P(\mathbf{x}) = P(x_1, \ldots, x_k).
$$

Furthermore, the observed value of $P$ will be denoted by $y$, where:

$$
y = \begin{cases} 1 & \text{if an injury is sustained,} \\ 0 & \text{if no injury is sustained.} \end{cases}
$$

It will be assumed that a logistic function provides a reasonable approximation to the function defining probability. The logistic function is given by:

$$
P(\mathbf{x}) = \left\{1 + \exp\left[-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)\right]\right\}^{-1} \tag{1}
$$


When this function is used, all predicted probabilities are restricted to the range (0,1). Furthermore, this function satisfies the conditions of being near zero in a part of $\mathbf{x}$-space, near one in another part, and increasing from near zero to near one over an intermediate part. It is also computationally tractable.
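As a minimal sketch of equation (1) (Python is used only for illustration; the function name and the two-variable coefficients in the demonstration are hypothetical), the probability can be computed as follows. The branch on the sign of the linear predictor is a common guard against overflow in `exp()` and does not change the value of the function.

```python
import math
from typing import Sequence


def logistic_probability(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """Equation (1): P(x) = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    if len(betas) != len(x):
        raise ValueError("need exactly one coefficient per variable")
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x))  # linear predictor
    # Evaluate in a form where exp() only ever sees a non-positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Hypothetical two-variable model (beta0 = 0, beta1 = beta2 = 1), echoing Figure 1:
    # small x1 and x2 give P near 0, large x1 and x2 give P near 1.
    for point in [(-3.0, -3.0), (0.0, 0.0), (3.0, 3.0)]:
        print(point, round(logistic_probability(0.0, [1.0, 1.0], point), 4))
```

The demonstration prints values of about 0.0025, 0.5, and 0.9975, showing the near-zero, intermediate, and near-one regions described above.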

Figure 1 illustrates a representative approximating probability prediction function for two $x$ variables. As can be seen, the predicted probability is near zero for $x_1$ small and $x_2$ small, near one for $x_1$ large and $x_2$ large, and intermediate in other regions. The logistic function can represent a variety of shapes by adjusting the coefficient values; thus, it is quite flexible.


III.  EVALUATION  OF  ESTIMATION  ACCURACY 




From a set of observed data, the coefficients (i.e., $\beta_0, \beta_1, \ldots, \beta_k$) of the logistic model may be estimated. The estimation process is fairly complex, involving an iterative procedure that yields the maximum likelihood estimates. This does not pose an insurmountable problem, however, since the computations can be carried out by computer.

Nonetheless, the data input to a model of this kind are of necessity dichotomous, so larger samples are required to obtain a desired degree of predictive accuracy than would be needed if the data were continuous. Thus, it is of central concern to investigate the degree of accuracy that may be expected for predictions derived from the model, and to examine the sensitivity of such predictions to sample size.


A.  EVALUATION  PROCEDURE 

A Monte Carlo simulation study was undertaken to provide information relating accuracy to sample size for selected model configurations. Two specific sets of model parameters were considered, Monte Carlo samples of various sizes were generated for each, and the accuracy of the resultant predictions was evaluated with respect to the true probabilities.

Two models were considered, each employing six variables ($x_1, \ldots, x_6$), which were allowed to take on values in the interval $(-1, 1)$. These models, hereafter referred to as Model A and Model B, differed only in the value of the parameter $\beta_0$: for Model A, $\beta_0 = 0$, while for Model B, $\beta_0 = -2$.

The remaining six coefficients were assigned the same values in both models:

$$
\beta_1 = -0.25, \quad \beta_2 = 0.50, \quad \beta_3 = -0.75, \quad \beta_4 = 1.00, \quad \beta_5 = -1.25, \quad \beta_6 = 1.50.
$$

These particular coefficient values were chosen to produce models with certain properties. Specifically, for both models the minimum attainable probability over the $\mathbf{x}$ region is near zero, while the maximum is near one. In addition, while the average probability over the $\mathbf{x}$ region is moderately high (.500) for Model A, it is relatively low (.225) for Model B. Thus, the observations generated by Model A would, in general, include more values of $y = 1$ than those generated by Model B.
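The "near zero" and "near one" claims can be checked directly. Because the linear predictor is monotone in each $x_i$, its extremes over the cube are approached at the corners, where it equals $\beta_0 \mp \sum_i |\beta_i|$ (here $\sum_i |\beta_i| = 5.25$). The sketch below (hypothetical function name; coefficients taken from the text) computes these limiting probabilities. For Model A it also follows that $P(-\mathbf{x}) = 1 - P(\mathbf{x})$ when $\beta_0 = 0$, so over the symmetric region $(-1, 1)^6$ the average probability is exactly one-half, consistent with the .500 quoted above.

```python
import math

# True coefficients beta_1..beta_6 shared by Model A and Model B (from the text).
BETAS = [-0.25, 0.50, -0.75, 1.00, -1.25, 1.50]


def probability_bounds(beta0, betas):
    """Infimum and supremum of P(x) over the open cube (-1, 1)^k.

    The linear predictor is monotone in each x_i, so its extremes are
    approached at the corners x_i = -sign(beta_i) and x_i = +sign(beta_i),
    i.e. at beta0 - sum|beta_i| and beta0 + sum|beta_i|.
    """
    spread = sum(abs(b) for b in betas)
    lo = 1.0 / (1.0 + math.exp(-(beta0 - spread)))
    hi = 1.0 / (1.0 + math.exp(-(beta0 + spread)))
    return lo, hi


if __name__ == "__main__":
    for name, beta0 in (("A", 0.0), ("B", -2.0)):
        lo, hi = probability_bounds(beta0, BETAS)
        print(f"Model {name}: {lo:.4f} < P(x) < {hi:.4f}")
```

This prints roughly 0.0052 to 0.9948 for Model A and 0.0007 to 0.9627 for Model B.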

Monte Carlo procedures were used to generate two series of ten overlapping samples (one series for each model) with sample sizes of $n = 100, 200, \ldots, 1000$. Each sample contained all of the observations in the preceding sample plus an additional 100 observations (i.e., the first sample contained 100 observations, the second contained the 100 observations of the first plus an additional 100, the third contained the 200 observations of the second plus an additional 100, etc.). Each observation was defined by generating a uniform random number over the interval $(-1, 1)$ for each of the six variables $x_1, \ldots, x_6$. The true probability associated with any observation $\mathbf{x}$ was then determined by calculating $P(\mathbf{x})$ from model equation (1), using the true coefficients for the respective model. Each observation was defined as resulting in an "occurrence" or "nonoccurrence" (e.g., an injury or noninjury) by generating a uniform random number $r$ in the interval $(0, 1)$ and defining $y$ such that

$$
y = \begin{cases} 1 & \text{if } P(\mathbf{x}) > r, \\ 0 & \text{if } P(\mathbf{x}) < r. \end{cases}
$$

Because $r$ is uniform on $(0, 1)$, this rule produces $y = 1$ with probability exactly $P(\mathbf{x})$.
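The sampling scheme above can be sketched as follows. This assumes Python's `random` module as the uniform generator (the report does not say which generator was used), and the function names are hypothetical. Generating one stream of 1000 observations and taking its first $n$ observations gives exactly the nested structure described: each sample contains the preceding one plus 100 new observations. (`random.uniform` can in principle return an endpoint of the interval; the difference from the open interval is immaterial here.)

```python
import math
import random

# True coefficients from the text: beta_1..beta_6 are shared, beta_0 differs.
BETAS = [-0.25, 0.50, -0.75, 1.00, -1.25, 1.50]
MODELS = {"A": 0.0, "B": -2.0}


def true_probability(beta0, betas, x):
    """Equation (1) evaluated at observation x."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))


def nested_samples(beta0, betas, n_max=1000, step=100, seed=0):
    """Return {n: (xs, ps, ys)} for n = step, 2*step, ..., n_max.

    One stream of n_max observations is generated and each sample is its first
    n observations, so every sample contains all observations of the previous
    one plus `step` new ones.
    """
    rng = random.Random(seed)
    xs, ps, ys = [], [], []
    for _ in range(n_max):
        x = [rng.uniform(-1.0, 1.0) for _ in betas]  # one value per variable
        p = true_probability(beta0, betas, x)         # true P(x)
        r = rng.random()                              # uniform on [0, 1)
        xs.append(x)
        ps.append(p)
        ys.append(1 if p > r else 0)                  # occurrence iff P(x) > r
    return {n: (xs[:n], ps[:n], ys[:n]) for n in range(step, n_max + 1, step)}


if __name__ == "__main__":
    samples = nested_samples(MODELS["A"], BETAS, seed=1)
    for n in (100, 500, 1000):
        xs, ps, ys = samples[n]
        print(n, "occurrences:", sum(ys))
```

Using the same seed for both models reuses the same $\mathbf{x}$ and $r$ values, so any difference between the two series comes only from $\beta_0$; the report does not say whether its two series shared random numbers.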

Each sample was then input to a computer program for estimating the coefficients of the logistic function. Two such programs were used. One, adapted from a program developed at the National Institutes of Health, uses the Walker-Duncan method [3] to obtain estimated coefficients and their estimated standard errors. The other, developed by Jones [1], solves for the maximum likelihood estimates. Both programs were found to yield the same results; the primary difference is in the output provided.


B.  RESULTS 

To evaluate estimation accuracy, the estimated coefficient values for Model A and Model B were compared with the respective true coefficient values. Figure 2 provides this comparison and indicates general convergence of the estimated values to the true values. Convergence may be seen more clearly in Figures 3 through 16, which graph the estimated coefficients and their estimated standard errors as a function of sample size. The true coefficient values are shown as horizontal lines. In general, the estimated values approximate the true values more and more closely as sample size increases. This provides a clear indication that the estimation process works.

Figure 2: Estimated Coefficients for Model A and Model B
Figure 8: Estimated Beta-5 as a Function of Sample Size (Model A)
(Estimated Standard Error Also Shown)
Figure 15: Estimated Beta-3 as a Function of Sample Size (Model B)
(Estimated Standard Error Also Shown)

Although comparison of estimated and true coefficients provides some measure of accuracy, a more useful measure results from a comparison of predicted and true probabilities. To provide a common basis of comparison across sample sizes, each set of estimated coefficients was used in conjunction with equation (1) to derive predicted probabilities for the first 100 observations (the observations common to every sample). Each set of predicted probabilities was paired with the corresponding set of true probabilities, and a linear regression equation was fitted to the data using a weighted least squares procedure.
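The report does not state the weights used in its weighted least squares fit. The sketch below takes them as a parameter and, purely as an assumption, defaults to inverse binomial variances $1/[p(1-p)]$ computed from the true probabilities. It returns an intercept, a slope, and one plausible definition of the weighted standard error about the line (weighted residual sum of squares divided by $n - 2$), corresponding to the three quantities tabulated in Figure 17. The function name is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def weighted_line_fit(true_p: Sequence[float], pred_p: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> Tuple[float, float, float]:
    """Weighted least squares fit of pred = a + b * true.

    Returns (intercept a, slope b, weighted standard error about the line).
    """
    n = len(true_p)
    if n < 3 or len(pred_p) != n:
        raise ValueError("need at least 3 paired probabilities")
    if weights is None:
        # ASSUMPTION: inverse binomial-variance weights; the report does not say.
        weights = [1.0 / (p * (1.0 - p)) for p in true_p]
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, true_p)) / sw
    ybar = sum(w * y for w, y in zip(weights, pred_p)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, true_p))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(weights, true_p, pred_p))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum(w * (y - a - b * x) ** 2 for w, x, y in zip(weights, true_p, pred_p))
    return a, b, math.sqrt(rss / (n - 2))
```

Applied to the true probabilities of the first 100 observations and the probabilities predicted from one fitted coefficient set, this would give one row of a table like Figure 17; perfect predictions give an intercept of 0, a slope of 1, and a standard error of 0.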

The results are summarized in Figure 17, which tabulates the intercept, slope, and estimated standard error about the regression line, taking the weights into account. In general, the slope of the regression line is near one, the intercept is near zero, and the estimated standard error becomes smaller as sample size increases. The improvement in prediction with increasing sample size can be seen more dramatically by comparing plots of estimated versus true probabilities for various sample sizes.

If the estimated probability prediction model were working correctly, predicted and true probabilities would be expected to cluster about the 45° line between (0,0) and (1,1), and in fact they do. Furthermore, with increased sample size, the clustering about the 45° line becomes tighter. This can be seen in Figures 18 through 23, which correspond to sample sizes of 100, 500, and 1000 for Model A and Model B.




Figure 17: Intercept, Slope, and Standard Error of Regression Line
(Predicted Probability versus True Probability)
Figure 18: Comparison of Predicted and True Probabilities for
Sample Size of 100 from Model A
Figure 20: Comparison of Predicted and True Probabilities for
Sample Size of 1000 from Model A
Figure 21: Comparison of Predicted and True Probabilities for
Sample Size of 100 from Model B
Figure 22: Comparison of Predicted and True Probabilities for
Sample Size of 500 from Model B
Figure 23: Comparison of Predicted and True Probabilities for
Sample Size of 1000 from Model B


IV.  DISCUSSION 
Based on the Monte Carlo study discussed in this report, it appears that the procedure used for estimating the coefficients of the logistic model works well. This procedure provides convergence to the correct coefficient values and true probabilities as the sample size increases. Thus, if injury probability as a function of head dynamic response variables may be approximated by a logistic function, the estimation procedure described in this report will yield a satisfactory approximation to that function.

Nonetheless, a major question remains: that of sample size requirements. Of course, larger samples tend to result in better estimates. Also, from the results of this study, it can be seen that overall probability predictions for Model A were, in general, better than those for Model B. This is not surprising, since the best discrimination would be expected in a data region where the split between occurrences and nonoccurrences is close to 50%-50%.
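A standard result for logistic regression, not derived in this report, helps explain this. The Fisher information matrix for the coefficients is

$$
\mathcal{I}(\boldsymbol{\beta}) = \sum_{j=1}^{n} P(\mathbf{x}_j)\,\bigl[1 - P(\mathbf{x}_j)\bigr]\,\tilde{\mathbf{x}}_j \tilde{\mathbf{x}}_j^{\mathsf{T}},
\qquad \tilde{\mathbf{x}}_j = (1, x_{j1}, \ldots, x_{jk})^{\mathsf{T}},
$$

and the covariance matrix of the maximum likelihood estimates is approximately $\mathcal{I}(\hat{\boldsymbol{\beta}})^{-1}$. Since $P(1 - P) \le 1/4$, with equality only at $P = 1/2$, observations whose true probability is near one-half contribute the most information, while observations with probability near 0 or 1 contribute little. This is consistent with, though it does not by itself prove, the better overall predictions obtained for Model A, whose average probability is .500.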

In general, then, no strong conclusions can be drawn about the required sample size. It may be noted, however, that in both Model A and Model B, agreement between estimated and true probabilities is reasonable even for a sample size of 100, particularly for low probability values.

Although there is interest in the overall agreement between estimated and true probabilities, low probability values are of major interest. This is because it is desired to exclude those dynamic response conditions for which the probability of injury is greater than some specified small value (for example, 1%, 5%, or 10%). Planned future research will explore, in detail, estimation accuracy in regions of low probability.
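Because the logistic function is monotone in its linear predictor, such an exclusion rule can be applied on the linear-predictor scale: $P(\mathbf{x}) > p_0$ exactly when $\beta_0 + \sum_i \beta_i x_i > \ln[p_0/(1 - p_0)]$. In the hypothetical sketch below, Model B's true coefficients stand in for the fitted estimates that would be used in practice, and the function names are illustrative only.

```python
import math

# Stand-in coefficients (Model B's true values); in practice these would be
# the fitted estimates.
BETA0_B = -2.0
BETAS = [-0.25, 0.50, -0.75, 1.00, -1.25, 1.50]


def logit(p: float) -> float:
    """Inverse of the logistic function: ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def exceeds_injury_threshold(beta0, betas, x, p0):
    """True when P(x) > p0, checked as beta0 + sum(beta_i * x_i) > logit(p0)."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return eta > logit(p0)


if __name__ == "__main__":
    for p0 in (0.01, 0.05, 0.10):
        print(f"p0 = {p0:.2f}: exclude when linear predictor > {logit(p0):.3f}")
    # At x = 0, Model B's linear predictor is -2, so P is about 0.119 > 0.10.
    print(exceeds_injury_threshold(BETA0_B, BETAS, [0.0] * 6, 0.10))
```

The cutoffs for the 1%, 5%, and 10% levels are about -4.595, -2.944, and -2.197 on the linear-predictor scale.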